Improved expression and secretion of heterologous proteins in yeast employing truncated alpha-factor leader sequences.

A yeast α-factor expression system is provided, comprising a truncated leader sequence that contains the α-factor signal peptide and one glycosylation site, linked by a processing site to a sequence encoding a non-yeast protein.



Technical Field 



The present invention relates to the production of recombinant proteins in yeast. More particularly, the present invention is directed to an improved α-factor expression system which provides for the secretion of heterologous proteins from yeast host cells.



Background 

Kurjan et al. (1982) Cell 30:933-943 discloses the first cloning and sequencing of a gene encoding a yeast α-factor precursor. Kurjan et al., U.S. Patent No. 4,546,082, also reports the cloning of this gene and suggests that the α-factor leader sequence can be employed to direct the secretion of heterologous proteins in yeast. The patent, however, does not contain data which would indicate that the patentees ever successfully employed the α-factor leader to express and secrete a heterologous protein in yeast.

EPO Publication No. 116,201 discloses the first successful application of the α-factor leader to direct the expression and secretion of a heterologous protein, epidermal growth factor, from a transformed yeast. Subsequent to this work, there have been additional reports of the expression of heterologous proteins in yeast employing the α-factor leader. See, e.g., Elliott et al. (1983) Proc. Nat'l Acad. Sci. USA 80:7080-7084; Bitter et al. (1984) Proc. Nat'l Acad. Sci. USA 81:5330-5334; Smith et al. (1985) Science 229:1219-1229; EPO Publication Nos. 114,695; 123,228; 123,294; 123,544; 128,773; and 206,783. See also Brake et al. in Protein Transport and Secretion, p. 103 (J.M. Gething ed. 1984).

The expression systems in the above reports produce a full-length α-factor leader fused to a heterologous protein. While the above work demonstrates that the α-factor expression system is widely useful, it is not generally predictable before performing the experiment whether a particular heterologous protein will be successfully secreted, processed and biologically active. See, e.g., EPO 206,783, supra, pp. 2-5; Rothblatt et al. (1987) EMBO J. 6:3455-3463; V.L. MacKay, "Secretion of Heterologous Proteins in Yeast" (in press).

There have been several reports, based on unpublished data, that deletions from the "pro" region of the α-factor leader (between the signal peptide and the first spacer) cause substantial declines in the amount of non-yeast protein secreted by yeast transformed with the heterologous constructs. Sidhu et al. (1987) Gene 54:175-184 reports that yeast acid phosphatase (PHO5) is secreted into the medium from a heterologous construct employing either a full-length α-factor leader or a truncated α-factor leader, but that expression levels are 3-4X lower than for the PHO5 gene under the control of its homologous leader. It has also been reported that deletions in the prepro-α-factor precursor gene result in substantial declines in secretion of the native α-factor peptide. See, e.g., V.L. MacKay, supra; Rothblatt et al., supra.

A need exists, therefore, to improve the α-factor expression system, particularly for applications to non-yeast proteins that are not efficiently produced by current α-factor expression constructs.



Summary of the Invention

It has been surprisingly discovered that a truncated form of the α-factor leader sequence can efficiently direct the expression and secretion of heterologous polypeptides in yeast. Particularly surprising is the discovery that truncated α-factor leader sequences can substantially improve the efficiency of expression of such heterologous proteins relative to expression systems using the full-length α-factor leader, e.g., higher levels of correct N-terminal processing, and secretion of heterologous proteins of which a greater percentage of the molecules are biologically active. These results are particularly surprising in view of reports that deletions from the leader sequence of the α-factor precursor result in decreased levels of secretion of active α-factor.

The present invention therefore provides alternative α-factor-based expression constructs, which are particularly useful for the expression of heterologous polypeptides that are either inefficiently expressed by full-length α-factor leader constructs or not expressed at all by such constructs.

In one embodiment, the present invention is directed to a yeast cell comprising a DNA construct that provides for the expression and secretion of a non-yeast protein, said DNA construct comprising a coding sequence under the control of yeast-recognized transcription initiation and termination sequences, said coding sequence encoding a precursor polypeptide comprised of a leader sequence and said non-yeast protein linked by a processing site that provides for the cleavage of said non-yeast protein from said precursor polypeptide, wherein said leader sequence is about 25 to about 50 N-terminal residues of said precursor polypeptide and comprises the signal peptide of a yeast α-factor precursor and a single glycosylation site.

In another embodiment, the present invention is directed to a double-stranded DNA molecule comprising a region encoding a precursor polypeptide secretable by a yeast host, said region, with reference to one of the DNA strands, having the structure:

5'-AF-CHO-Xn-S-Gene*-3'

wherein

AF encodes a yeast α-factor signal peptide;
CHO encodes a glycosylation site;
Xn encodes a polypeptide of n amino acids in length that does not contain a glycosylation site or a processing site that provides for cleavage of said precursor polypeptide in vivo by yeast;
n is an integer from 0 to about 30;
Gene* encodes a non-yeast protein; and
S encodes a processing site that provides for cleavage of said precursor polypeptide.

The present invention is also directed to methods of employing the above cells and DNA constructs to 
produce recombinant proteins, as well as the compositions of recombinant proteins produced by the above 
methods. Other embodiments will also be readily apparent to those of ordinary skill in the art. 


Description of the Figures 

Figure 1 is a flow diagram showing the construction of pYBCA5, and both the nucleotide and amino acid sequences of the synthetic proinsulin gene employed in Example I. The various synthetic oligonucleotides used in the construction are delineated by black dots. Arrows above the sequence show the beginning and end of the B, C and A proinsulin chains. The boxes mark the dibasic processing sites.

Figure 2 is the nucleotide and amino acid sequence of a synthetic oligonucleotide encoding a modified α-factor leader and the first 13 amino acids of the proinsulin gene used to construct pYGAI3 in Example I. The modified α-factor leader has had the glycosylation sites removed by changing the codons for Asn23, Asn57 and Asn67 to encode Gln (boxed). The arrow denotes the junction between the sequence encoding the KEX2 endopeptidase site and the N-terminus of human proinsulin.

Figure 3 shows the synthetic gene of fragment pYGAIC3 encoding the proinsulin analog, where a KEX2 endopeptidase site has replaced the C peptide (boxed). The synthetic 133 bp fragment referred to in Example II is defined by the vertical and horizontal lines through the nucleotide sequence.
Figure 4 is a restriction map of yeast shuttle vector pAB24. 

Figure 5 shows the DNA sequence of the synthetic gene encoding IGFI described in Example III. 

Figure 6 is a restriction map of pYLUIGFI-55, an expression vector described in Example III encoding IGFI under the control of a truncated α-factor leader.

Figure 7 is a restriction map of pYLUIGFI-24, an expression vector described in Example III encoding IGFI under the control of a full-length α-factor leader with three glycosylation sites.



Detailed Description 


The practice of the present invention will employ, unless otherwise indicated, conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are fully explained in the literature. See, e.g., Maniatis, Fritsch & Sambrook, Molecular Cloning: A Laboratory Manual (1982); DNA Cloning, Vols. I & II (D.N. Glover, ed. 1985); Oligonucleotide Synthesis (M.J. Gait, ed. 1984); Transcription and Translation (B.D. Hames & S.J. Higgins, eds. 1984); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984).

In describing the present invention, the following terminology will be used in accordance with the 
definitions set out below. 
A "replicon" is any genetic element (e.g., plasmid, cosmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo, i.e., capable of replication under its own control.

A "vector" is a replicon such as a plasmid, phage, or cosmid to which another DNA segment may be 
attached so as to bring about the replication of the attached segment. 

A "double-stranded DNA molecule" refers to the polymeric form of deoxyribonucleotides (containing the bases adenine, guanine, thymine, or cytosine) in its normal, double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of a particular double-stranded DNA molecule, sequences will be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the nontranscribed strand of DNA, i.e., the strand having a sequence homologous to the mRNA produced from a particular coding sequence.
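The strand convention above can be illustrated with a minimal Python sketch. It assumes a clean A/C/G/T string and uses a hypothetical 12-nucleotide fragment; it shows that the nontranscribed (coding) strand reads like the mRNA with T in place of U, while the template strand is its reverse complement when also written 5' to 3'.

```python
# Complement table for DNA bases (assumes only A, C, G, T are present).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding: str) -> str:
    """Return the template strand, written 5' to 3' (reverse complement of the coding strand)."""
    return coding.upper().translate(COMPLEMENT)[::-1]


def mrna_from_coding(coding: str) -> str:
    """The mRNA carries the coding-strand sequence, with U in place of T."""
    return coding.upper().replace("T", "U")


coding = "ATGAGATTTCCT"           # hypothetical coding-strand fragment, 5' to 3'
print(template_strand(coding))    # AGGAAATCTCAT
print(mrna_from_coding(coding))   # AUGAGAUUUCCU
```

Because only the nontranscribed strand is written, the other strand is always recoverable by reverse complementation, which is why the convention loses no information.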

A DNA "coding sequence" is a DNA sequence which can be transcribed and translated into a polypeptide in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by, and include, the translation start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, procaryotic DNA sequences, viral DNA sequences, cDNA or genomic DNA sequences from eucaryotic sources (e.g., mammalian), and even synthetic DNA sequences.
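The boundary rule in this definition (start codon through in-frame stop codon, both included) can be sketched in Python as follows. This is a simplified illustration, not a gene finder: it assumes the first ATG is the start codon, uses the standard genetic code, and runs on a hypothetical input string.

```python
# Standard genetic code, packed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def find_coding_sequence(dna: str):
    """Return DNA from the first ATG through the first in-frame stop codon, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start < 0:
        return None
    for pos in range(start, len(dna) - 2, 3):
        if CODON_TABLE[dna[pos:pos + 3]] == "*":
            return dna[start:pos + 3]   # boundaries include both start and stop codons
    return None                         # no in-frame stop: not a complete coding sequence


def translate(cds: str) -> str:
    """Translate a coding sequence, omitting the terminal stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 3, 3))


cds = find_coding_sequence("GGATGAAACGTTGGTAAGC")   # hypothetical input
print(cds, translate(cds))                          # ATGAAACGTTGGTAA MKRW
```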

"Yeast-recognized transcription initiation and termination sequences" refer to DNA regulatory regions which flank a coding sequence and are responsible for the transcription in yeast of an mRNA homologous to the coding sequence, which can then be translated into the polypeptide encoded by the coding sequence.

Transcription initiation sequences include yeast promoter sequences, which are DNA regulatory sequences capable of binding yeast RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. For the purposes of defining the present invention, the promoter sequence is bounded at its 3' terminus by (and excludes) the translation start codon of a coding sequence and extends upstream (5' direction) to include the minimum number of nucleotides or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein-binding domains (consensus sequences) responsible for the binding of the yeast RNA polymerase. Promoters useful in the present invention include the wild-type α-factor promoter, as well as other yeast promoters. Particularly preferred are promoters of genes encoding enzymes of the glycolytic pathway, e.g., phosphoglucoisomerase, phosphofructokinase, phosphotrioseisomerase, phosphoglucomutase, enolase, pyruvic kinase, glyceraldehyde-3-phosphate dehydrogenase and alcohol dehydrogenase, as well as hybrids of these promoters. See, e.g., EPO Publication Nos. 120,551; 164,556. Transcription initiation sequences can also include other regulatory regions responsible for promoter regulation or enhancement. In like manner, a transcription terminator sequence located 3' to the translation stop codon can be either the wild-type α-factor transcription termination sequence or another yeast-recognized termination sequence, such as those from the genes for the above glycolytic enzymes.

A coding sequence is "under the control" of transcription initiation and termination sequences when RNA polymerase binds the transcription initiation sequence and transcribes the coding sequence into mRNA terminating at the transcription termination sequence, and the mRNA is then translated into the polypeptide encoded by the coding sequence (i.e., "expression"). The precursor polypeptide encoded by the coding sequences of the present invention is "secreted" when at least a portion (usually the non-yeast protein in the absence of the leader sequence) is transported extracellularly, where it is found in the cell growth medium. Usually, only the portion of the precursor protein downstream from the leader sequence is secreted, and this downstream portion may also be subjected to additional processing during secretion, such as proteolytic cleavage, glycosylation, folding, disulfide bond formation, etc.

A cell has been "transformed" by exogenous DNA when such exogenous DNA has been introduced inside the cell wall. Exogenous DNA may or may not be integrated (covalently linked) into chromosomal DNA making up the genome of the cell. The exogenous DNA may be maintained extrachromosomally on a replicon such as a plasmid. When the exogenous DNA has become integrated into the chromosome, it is inherited by daughter cells through chromosome replication. A cell which has been transformed by exogenous DNA which is integrated into the chromosome is referred to as a "stably" transformed cell. A "clone" or "clonal population" is a population of cells derived from a single cell or common ancestor by mitosis.

Two DNA sequences are "substantially homologous" when at least about 60% (preferably at least about 75%, and most preferably at least about 90%) of the nucleotides match over a defined length of the molecules. Sequences that are substantially homologous can be identified in a Southern hybridization experiment under conditions of a selected stringency as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Maniatis et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.
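The percentage thresholds in this definition amount to simple arithmetic over a defined length. The Python sketch below assumes two sequences that are already aligned without gaps and of equal length; it is an illustration of the thresholds, not a substitute for alignment software or the hybridization test described above, and it treats the "about" thresholds as sharp cutoffs for simplicity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def homology_tier(pct: float) -> str:
    """Map an identity percentage onto the thresholds named in the definition."""
    if pct >= 90:
        return "substantially homologous (most preferred, >= 90%)"
    if pct >= 75:
        return "substantially homologous (preferred, >= 75%)"
    if pct >= 60:
        return "substantially homologous (>= 60%)"
    return "not substantially homologous"


pct = percent_identity("ATGAGATTTCCT", "ATGAGGTTCCCT")  # hypothetical pair, 10 of 12 match
print(f"{pct:.1f}% -> {homology_tier(pct)}")
```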

A "heterologous region" of a DNA molecule is an identifiable segment of DNA within a larger DNA molecule that is not found in association with the larger molecule in nature. Thus, when the heterologous region encodes a mammalian protein, the heterologous region will usually be flanked by DNA that does not flank the mammalian DNA sequence in the genome of the source organism. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different from those of organisms which encode the same or similar protein). Allelic variations or naturally occurring mutational events do not give rise to a "heterologous" region of DNA as used herein.

As used herein, "yeast" includes ascosporogenous yeasts (Endomycetales), basidiosporogenous yeasts and yeasts belonging to the Fungi Imperfecti (Blastomycetes). The ascosporogenous yeasts are divided into two families, Spermophthoraceae and Saccharomycetaceae. The latter is comprised of four subfamilies, Schizosaccharomycoideae (e.g., genus Schizosaccharomyces), Nadsonioideae, Lipomycoideae and Saccharomycoideae (e.g., genera Pichia, Kluyveromyces and Saccharomyces). The basidiosporogenous yeasts include the genera Leucosporidium, Rhodosporidium, Sporidiobolus, Filobasidium and Filobasidiella. Yeasts belonging to the Fungi Imperfecti are divided into two families, Sporobolomycetaceae (e.g., genera Sporobolomyces, Bullera) and Cryptococcaceae (e.g., genus Candida). Of particular interest to the present invention are species within the genera Pichia, Kluyveromyces, Saccharomyces, Schizosaccharomyces and Candida. Of particular interest are the Saccharomyces species S. cerevisiae, S. carlsbergensis, S. diastaticus, S. douglasii, S. kluyveri, S. norbensis and S. oviformis. Species of particular interest in the genus Kluyveromyces include K. lactis. Since the classification of yeast may change in the future, for the purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (F.A. Skinner, S.M. Passmore & R.R. Davenport eds. 1980) (Soc. App. Bacteriol. Symp. Series No. 9). In addition to the foregoing, those of ordinary skill in the art are presumably familiar with the biology of yeast and the manipulation of yeast genetics. See, e.g., Biochemistry and Genetics of Yeast (M. Bacila, B.L. Horecker & A.O.M. Stoppani eds. 1978); The Yeasts (A.H. Rose & J.S. Harrison eds., 2nd ed., 1987); The Molecular Biology of the Yeast Saccharomyces (Strathern et al. eds. 1981). The disclosures of the foregoing references are incorporated herein by reference.

The present invention employs truncated leader sequences from a yeast α-factor gene. α-Factor is an oligopeptide mating pheromone about 13 residues in length, produced from a large precursor polypeptide (prepro-α-factor) between about 100 and 200 residues (typically about 120-160) in length. The precursor is comprised of a hydrophobic "signal sequence" of about 20 residues (e.g., about 19-23, typically about 20-22) followed by an additional leader region of about 60 hydrophilic residues (the "pro" region), which is then linked to several tandem repeats of the mature pheromone sequence (typically about 2-6) separated by short oligopeptide spacer regions (typically about 6-8 residues) which provide for proteolytic processing to the mature pheromone.

The cloning of various prepro-α-factor genes has been reported. See, e.g., Kurjan et al., U.S. Patent No. 4,546,082; Singh et al. (1983) Nucleic Acids Res. 11:4049-4063; commonly owned U.S. Patent Application Serial No. 078,551, filed 28 July 1987, the disclosures of which are incorporated herein by reference. In addition, DNA sequences encoding the prepro-α-factor gene can be identified by hybridization with probes from known prepro-α-factor sequences. See, e.g., Brake et al. (1983) Molec. & Cell Biol. 3:1440-1450. α-Factor may also be purified from a yeast species and sequenced, and probes designed from that sequence to clone the prepro-α-factor gene. See, e.g., McCullough et al. (1979) J. Bacteriol. 138:146-154; Sato et al. (1981) Agric. Biol. Chem. 44:1451-1453; Singh et al. (1983), supra. It has also been determined that the α-factor leader sequence from one yeast species can be functional in another yeast species. See, e.g., U.S.S.N. 078,551, supra. Thus, the present invention contemplates not only the use of α-factor leader sequences from yeast in general, but the use of such leader sequences in heterologous yeast species. For ease of presentation, however, the invention will be discussed in terms of the prepro-α-factor gene MFα1 from S. cerevisiae. See, e.g., Kurjan et al., U.S. Patent No. 4,546,082; Singh et al. (1983), supra.

The present invention employs chimeric DNA constructs encoding hybrid precursor polypeptides comprised of a leader sequence and a non-yeast polypeptide. For purposes of this invention, the leader sequence DNA is defined as beginning at the N-terminal start codon (methionine) of the precursor polypeptide and extending through the codon encoding the last amino acid residue before the processing site that intervenes between the leader sequence and the sequence encoding the non-yeast protein. The leader sequence of the present invention is comprised of a truncated form of a yeast α-factor leader sequence, typically about 25 to about 50 amino acid residues in length. Thus, the leader sequence of the present invention is at least about 30 amino acid residues shorter than the typical full-length α-factor leader. MFα1, for example, contains a leader sequence of 83 amino acid residues followed by a hexapeptide spacer sequence which is cleaved by yeast processing enzymes. In making deletions from the leader sequence, it is important that at least one glycosylation site (-Asn-X-Thr/Ser-) be retained to provide for efficient secretion. It is also necessary that the leader retain a functional α-factor signal sequence. As indicated above, the signal peptide is usually about 20 amino acids in length and has characteristic features, including a hydrophobic core. See, e.g., von Heijne (1984) J. Mol. Biol. 173:243-251. All of the prepro-α-factor sequences examined to date encode a hydrophobic peptide of about 20 residues in length. While the exact length of signal peptide necessary to direct the precursor polypeptide to the secretory pathway is not defined, it will usually require between about 19 and about 23 residues; the minimum sequence required can readily be determined by testing deletion mutants.
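Locating the glycosylation sites that a deletion must preserve is a short pattern search. The Python sketch below is illustrative only: the MFα1 residues 1-89 are an assumed transcription of the commonly reported S. cerevisiae sequence (check it against the published sequence before relying on it), and the pattern follows the text's Asn-X-Thr/Ser definition without the additional exclusion of Pro at X that is often applied in practice. Under those assumptions it recovers the 83-residue leader, the Lys-Arg start of the hexapeptide spacer, and the sites at Asn23, Asn57 and Asn67 referred to in this description.

```python
import re

# Assumed transcription of S. cerevisiae MFalpha1 residues 1-89; verify against
# the published sequence (e.g., Kurjan et al. 1982) before use.
SIGNAL = "MRFPSIFTAVLFAASSALA"                                                # residues 1-19
PRO = "APVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLE"  # 20-83
SPACER = "KREAEA"                                                          # 84-89
PREPRO = SIGNAL + PRO + SPACER

# Zero-width lookahead so that overlapping Asn-X-Thr/Ser motifs are all reported.
SEQUON = re.compile(r"(?=N.[ST])")


def glycosylation_sites(protein: str) -> list:
    """Return 1-based positions of Asn residues that start an Asn-X-Thr/Ser motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


print(len(SIGNAL), len(SIGNAL + PRO))    # 19 83
print(glycosylation_sites(PREPRO))        # [23, 57, 67]
```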

Thus, with reference to MFα1, deletions within the range of about 30 to about 60 residues, typically between about 33 and about 58 residues, and more typically between about 48 and about 58 residues, are contemplated by the present invention. These deletions will generally occur in the region between and including residues 26 through 83. It is preferred that the deletions include the glycosylation sites at residues 57-59 and 67-69. The deleted α-factor leader sequences may be replaced, in part, by non-α-factor leader sequences, if desired. Such replacement sequences should generally encode hydrophilic amino acid residues, should not encode glycosylation or processing sites, and preferably should be selected to keep the overall length of the leader at about 50 residues or less, preferably about 23 to 40 residues, and most preferably about 25 to 35 residues.
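The deletion ranges above follow from simple arithmetic on the 83-residue MFα1 leader: removing 33 to 58 residues leaves 50 to 25 residues. The Python sketch below checks a proposed deletion against the criteria stated in the text (at least one glycosylation site kept, leader of about 25 to 50 residues, deletion within residues 26-83). The site spans come from the text (the Asn23 site is taken as residues 23-25 by the same three-residue rule); treating "about" as a hard cutoff is a simplifying assumption.

```python
# MFalpha1 leader length and glycosylation-site spans, taken from the text.
LEADER_LENGTH = 83
SITES = [(23, 25), (57, 59), (67, 69)]


def evaluate_deletion(start: int, end: int) -> dict:
    """Describe the leader left after deleting residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= LEADER_LENGTH:
        raise ValueError("deletion must lie within the 83-residue leader")
    deleted = end - start + 1
    remaining = LEADER_LENGTH - deleted
    # A site survives only if the deletion touches none of its three residues.
    kept = [site for site in SITES if site[1] < start or site[0] > end]
    return {
        "deleted": deleted,
        "leader_length": remaining,
        "sites_kept": kept,
        "in_region_26_83": start >= 26,
        # Text: keep at least one site; leader typically about 25 to 50 residues.
        "acceptable": len(kept) >= 1 and 25 <= remaining <= 50,
    }


for span in [(26, 83), (51, 83), (20, 83)]:
    print(span, evaluate_deletion(*span))
```

The third span deliberately fails: it removes the only remaining glycosylation site (Asn23) and shrinks the leader to the bare signal peptide.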

As indicated above, the leader sequence of the present invention has immediately 3' thereto a processing site which allows for the cleavage of the leader from the non-yeast protein sequence to which it is fused in the precursor polypeptide. The processing site employed in the present invention is defined as the codons encoding the minimum number of amino acid residues that are specifically recognized for cleavage by the selected process (e.g., chemical, enzymatic, etc.). Various processing sites are known in the art, including both those active in vivo and those active in vitro. For example, the processing site may provide for in vitro processing by encoding a cleavage site for a proteolytic enzyme which does not occur in the yeast host. The recovered precursor polypeptide would then be treated with the enzyme to cleave the non-yeast protein from the precursor. Another in vitro processing site is a methionine codon; the encoded methionine residue can be cleaved by post-expression treatment with cyanogen bromide. See, e.g., U.S. Patent No. 4,366,246.
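The cyanogen bromide option implies a constraint worth making explicit: the reagent cuts after every methionine, so the non-yeast protein is released intact only if it contains no internal Met. The Python sketch below models the cut sites only; it ignores the chemistry (the Met at each cut is converted to homoserine lactone), and the precursor is hypothetical, fusing a short Met-terminated leader to the human insulin A-chain sequence purely as an illustrative Met-free target.

```python
def cnbr_cleave(protein: str) -> list:
    """Model cyanogen bromide cleavage: cut on the carboxyl side of every Met (M).

    Chemical details are ignored; in reality the Met at each cut becomes
    homoserine lactone rather than remaining as Met.
    """
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


# Hypothetical precursor: a short leader ending in Met, fused to a Met-free target.
target = "GIVEQCCTSICSLYQLENYCN"
fragments = cnbr_cleave("MAPVNTTM" + target)
print(fragments)
print(fragments[-1] == target)   # True only because the target has no internal Met
```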

In vivo processing sites can be selected from any peptide signals recognized by a yeast proteolytic enzyme which will provide for expression of the desired non-yeast protein sequence. Particularly preferred processing sites are those for the enzymes involved in processing of native prepro-α-factor. For example, dipeptidyl aminopeptidase A (DPAPase A) removes N-terminal -X-Ala- sequences, where X is Glu or Asp. See, e.g., Julius et al. (1983) Cell 32:839-852. The endopeptidase encoded by the KEX2 gene cleaves at basic dipeptides comprised of Lys and Arg residues, i.e., Lys-Arg, Arg-Arg, Arg-Lys and Lys-Lys. Fuller et al., Microbiology 1986, pp. 273-278 (1986). In yeast, the α-factor precursor is first cleaved by the KEX2 endopeptidase, and the N-termini are then trimmed by DPAPase A to provide mature α-factor pheromone. Since the latter proteolytic process appears to be rate-limiting, it is preferred to eliminate the signals for DPAPase A, such that the processing site is comprised only of the signal for the KEX2 endopeptidase. In such an embodiment, therefore, the leader sequence will be joined to the non-yeast protein sequence by the dibasic peptide recognition site for KEX2 endopeptidase, such as Lys-Arg or Arg-Arg.
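The two-step native processing, and why a KEX2-only junction avoids the DPAPase A step, can be sketched in Python. This is a deliberately simplified model, not a description of enzyme kinetics: "KEX2" here cuts on the carboxyl side of every non-overlapping dibasic pair, "DPAPase A" strips N-terminal Glu-Ala or Asp-Ala dipeptides, and signal peptidase is ignored. The first input mimics the α-factor spacer-to-pheromone junction; the second is a hypothetical truncated leader ending in Lys-Arg, fused to the human insulin B-chain sequence used only as an illustrative target.

```python
import re

# Dibasic pairs recognized in this model: KR, RR, RK, KK (non-overlapping).
DIBASIC = re.compile(r"[KR][KR]")


def kex2_cleave(precursor: str) -> list:
    """Simplified KEX2: cut on the carboxyl side of each non-overlapping dibasic pair."""
    fragments, start = [], 0
    for match in DIBASIC.finditer(precursor):
        fragments.append(precursor[start:match.end()])
        start = match.end()
    fragments.append(precursor[start:])
    return [f for f in fragments if f]


def dpapase_a_trim(peptide: str) -> str:
    """Simplified DPAPase A: strip N-terminal Glu-Ala / Asp-Ala dipeptides."""
    while peptide[:2] in ("EA", "DA"):
        peptide = peptide[2:]
    return peptide


# Native-style junction: spacer ...KR-EAEA ahead of the mature pheromone.
native = kex2_cleave("GVSLEKREAEAWHWLQLKPGQPMY")
print(native, dpapase_a_trim(native[-1]))   # second step yields WHWLQLKPGQPMY

# Hypothetical KEX2-only junction: leader ...KR fused directly to the target.
direct = kex2_cleave("MRFPSIFTAVLFAASSALAAPVNTTKR" + "FVNQHLCGSHLVEALYLVCGERGFFYTPKT")
print(direct[-1])                            # released without any trimming step
```

In the second case KEX2 alone releases the target with its native N-terminus, which is the rationale given above for dropping the DPAPase A signals.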

The carboxy-terminal portion of the precursor polypeptide of the present invention is a non-yeast protein. The DNA sequence encoding this portion is defined herein as beginning with the first codon downstream (3' direction) from the last codon of the processing site and extending through the translation stop codon which defines the carboxy terminus of the precursor polypeptide. This DNA sequence will be considered to encode a "non-yeast protein" when, over its entire sequence, it defines a polypeptide that is not substantially homologous to a polypeptide expressed by yeast. In general, the preferred non-yeast proteins will be mammalian protein sequences (including their analogs, i.e., "muteins", fragments, etc.). As defined herein, non-yeast proteins can therefore include a fusion protein comprised of mammalian and yeast sequences, as well as "pro" forms of the mature mammalian protein.

DNA sequences encoding the non-yeast proteins can be sequences cloned from non-yeast organisms, or they can be synthetic sequences, usually prepared using yeast-preferred codons. Usually, the non-yeast proteins will be at least about 8 amino acids in length and can include polypeptides of up to about 100,000 daltons or higher. Usually, the non-yeast polypeptide sequence will be less than about 300,000 daltons, and more usually less than about 150,000 daltons. Of particular interest are polypeptides of from about 5,000 to about 150,000 daltons, more particularly of about 5,000 to about 100,000 daltons. Illustrative non-yeast proteins of interest include hormones and factors, such as growth hormone, somatomedins, epidermal growth factor, luteinizing hormone, thyroid-stimulating hormone, oxytocin, insulin, vasopressin, renin, calcitonin, follicle-stimulating hormone, prolactin, erythropoietin, colony-stimulating factors, lymphokines such as interleukin-2, globins, immunoglobulins, interferons (e.g., α, β or γ), enzymes, β-endorphin, enkephalin, dynorphin, insulin-like growth factors, etc.

In a preferred embodiment of the present invention, DNA constructs encoding the above-described precursor polypeptides have the structure:

5'-AF-CHO-Xn-S-Gene*-3'

wherein AF encodes a yeast α-factor signal peptide; CHO encodes a glycosylation site; Xn encodes a polypeptide of n amino acids in length that does not contain a glycosylation site or processing site that will cause the precursor polypeptide to be cleaved in vivo by the yeast host; n is an integer from 0 to about 30; Gene* encodes a non-yeast protein; and S encodes a processing site that provides for cleavage of said precursor polypeptide.

The signal peptide encoded by AF is the same α-factor signal peptide described above. It is approximately 20 residues in length (e.g., about 19-23) and is of sufficient length to direct the precursor polypeptide into the yeast secretory pathway. The precise minimum or maximum length can be determined for a particular α-factor by screening a series of deletion constructs.

The DNA sequence defined by CHO encodes a glycosylation site. It will generally be nine nucleotides in length, comprising three codons for the amino acids Asn-X-Y, wherein X is any amino acid residue and Y is Thr or Ser.

Xn, if present, encodes, for example, portions of the α-factor leader which are not deleted, or unrelated amino acid sequences. In general, Xn will be a maximum of about 30 amino acid residues, more preferably a maximum of about 20 residues, and most preferably a maximum of about 10 residues. While it may not be necessary for Xn to encode any residues (i.e., n = 0), it may be desirable to provide some spacing between the glycosylation site CHO and the processing site S in the event that carbohydrate added at the glycosylation site sterically hinders access of the agent which cleaves the processing site. In such a case, n will usually be a minimum of about 1, more preferably a minimum of about 2, and most preferably a minimum of about 3.

It is preferred that Xn, if present, not contain any functional glycosylation sites or processing sites recognized and cleaved by the yeast host. Further, when departing from sequences found in an α-factor leader, it is preferred to select hydrophilic amino acid residues. The length of Xn may affect the efficiency of expression and secretion of the non-yeast protein; the appropriate length of Xn for optimal expression can be selected by screening constructs of various sizes.
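The constraints on AF, CHO, Xn and S can be collected into a protein-level checker that might be used to screen candidate constructs before they are built. The Python sketch below is a hypothetical helper, not part of the described method: it assumes S is a KEX2 dibasic pair, treats the stated ranges (about 19-23 residues for AF, n up to about 30, leader up to about 50) as hard limits, and scans the whole leader plus S so that motifs spanning segment junctions are also caught. The example segments are illustrative; the Gene* segment reuses the human insulin B-chain sequence only as a placeholder target.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Precursor:
    """Protein-level view of 5'-AF-CHO-Xn-S-Gene*-3' in one-letter codes."""
    af: str    # alpha-factor signal peptide
    cho: str   # glycosylation site, Asn-X-Thr/Ser
    xn: str    # optional spacer of n residues (may be empty, n = 0)
    s: str     # processing site; assumed here to be a KEX2 dibasic pair
    gene: str  # non-yeast protein

    @property
    def leader(self) -> str:
        return self.af + self.cho + self.xn

    def problems(self) -> List[str]:
        """List departures from the constraints described in the text."""
        issues = []
        if not 19 <= len(self.af) <= 23:
            issues.append("AF is outside the ~19-23 residue range")
        if not re.fullmatch(r"N.[ST]", self.cho):
            issues.append("CHO is not an Asn-X-Thr/Ser site")
        if len(self.xn) > 30:
            issues.append("n exceeds about 30")
        upstream = self.leader + self.s
        # Lookahead scans report overlapping motifs, including junction-spanning ones.
        sequons = [m.start() for m in re.finditer(r"(?=N.[ST])", upstream)]
        if sequons != [len(self.af)]:
            issues.append(f"glycosylation motifs at {sequons}; expected only CHO")
        dibasic = [m.start() for m in re.finditer(r"(?=[KR][KR])", upstream)]
        if dibasic != [len(self.leader)]:
            issues.append(f"dibasic pairs at {dibasic}; expected only S")
        if len(self.leader) > 50:
            issues.append("leader longer than about 50 residues")
        return issues

    def sequence(self) -> str:
        return self.leader + self.s + self.gene


good = Precursor(af="MRFPSIFTAVLFAASSALA", cho="NTT", xn="TEDETA",
                 s="KR", gene="FVNQHLCGSHLVEALYLVCGERGFFYTPKT")
print(len(good.leader), good.problems())   # 28 []

bad = Precursor(af=good.af, cho="NTT", xn="NGSK", s="KR", gene=good.gene)
for issue in bad.problems():
    print(issue)
```

The failing example shows the two hazards the text warns about for Xn: a second glycosylation motif (Asn-Gly-Ser) and a Lys that forms an extra dibasic pair with S, which would shift the cleavage point.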

The non-yeast protein encoded by Gene* and the processing site encoded by S are as described 
above. 

The DNA constructs of the present invention will normally be maintained in a replicon capable of stable maintenance in a host, particularly a yeast host. The replicons, usually plasmids, will include one or more replication systems, desirably two replication systems, allowing for maintenance of the replicon in both a yeast host for expression and a procaryotic host for cloning. Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 8:17-24], pC1/1 [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982) J. Mol. Biol. 158:157]. Furthermore, a plasmid expression vector may be a high or low copy number plasmid, the copy number generally ranging from about 1 to about 200. With high copy number yeast vectors, there will generally be at least 10, preferably at least 20, and usually not exceeding about 150 copies in a single host. Depending upon the effect of the vector and the foreign protein on the host, either a high or a low copy number vector may be desirable for a given non-yeast protein. See, e.g., Brake et al., supra. DNA constructs of the present invention can also be integrated into the yeast genome by an integrating vector. Examples of such vectors are known in the art. See, e.g., Botstein et al., supra.

The selection of suitable yeast and other microorganism hosts for the practice of the present invention is within the skill of the art. When selecting yeast hosts for expression, suitable hosts may include those shown to have, inter alia, good secretion capacity, low proteolytic activity, and overall robustness. Yeast and other microorganisms are generally available from a variety of sources, including the Yeast Genetic Stock Center, Department of Biophysics and Medical Physics, University of California, Berkeley, California; and the American Type Culture Collection, Rockville, Maryland.

Methods of introducing exogenous DNA into yeast hosts are well known in the art, and there is a wide variety of ways to transform yeast. Spheroplast transformation is taught, for example, by Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929-1933, and Stinchcomb et al., EPO Publication No. 45,573. Transformants are grown in an appropriate nutrient medium and, where appropriate, maintained under selective pressure to ensure retention of the exogenous DNA. Where expression is inducible, the yeast host can be grown to a high cell density before expression is induced. The secreted, processed non-yeast protein can be harvested by any conventional means and purified by chromatography, electrophoresis, dialysis, solvent-solvent extraction, and the like.



Examples



The following examples are provided for illustrative purposes only, and are not intended to limit the scope of the present invention. It is believed that the deposit of the starting biological materials is not necessary for the practice of the present invention since either the same or equivalent materials are publicly available.



I. 


The following example compares the levels of expression and secretion obtained with modified α-factor constructs used to express human proinsulin. Three constructs employ full-length α-factor leaders: one has the α-factor leader with all three native glycosylation sites, one has all three glycosylation sites eliminated, and one has all of the sites except the one at Asn23 removed. The fourth construct has a truncated α-factor leader which retains a single glycosylation site at Asn23.
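The glycosylation sites compared here are N-linked sequons (Asn-X-Ser/Thr, where X is not Pro). As a minimal sketch, the following Python scans a leader for sequons and applies the scan to the four leaders described above. The 83-residue leader string is an assumption (the commonly reported S. cerevisiae MFα1 leader up to the Lys-Arg site), and the helper names are hypothetical.

```python
import re

# N-linked glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping candidates each be tested.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Assumption: residues 1-83 of the S. cerevisiae MF-alpha-1 leader, ending just
# before the Lys-Arg processing site. Verify against a sequence database before
# reusing it; residues away from the Asn-X-Ser/Thr motifs do not affect the result.
ALPHA_LEADER = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIG"
    "YLDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLD"
)


def sequon_positions(protein: str) -> list[int]:
    """Return the 1-based positions of Asn residues that begin a sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def asn_to_gln(protein: str, positions: list[int]) -> str:
    """Hypothetical helper: substitute Gln for Asn at 1-based positions."""
    residues = list(protein)
    for pos in positions:
        if residues[pos - 1] != "N":
            raise ValueError(f"residue {pos} is {residues[pos - 1]}, not Asn")
        residues[pos - 1] = "Q"
    return "".join(residues)


# Leader variants compared in this example (plasmid names from the text).
LEADERS = {
    "pYGAI1": ALPHA_LEADER,                          # full length, native
    "pYGAI3": asn_to_gln(ALPHA_LEADER, [23, 57, 67]),
    "pYGAI8": asn_to_gln(ALPHA_LEADER, [57, 67]),
    "pYGAI7": ALPHA_LEADER[:35],                     # truncated after residue 35
}

for name, leader in LEADERS.items():
    print(f"{name}: {len(leader)} residues, sequons at {sequon_positions(leader)}")
```

Under this assumed sequence, the scan reports sequons at 23, 57 and 67 for the native leader, none for pYGAI3, and only Asn23 for both pYGAI8 and pYGAI7. The two single-site constructs therefore differ in leader length, not in glycosylation pattern.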



A. pYGAI1


This plasmid encodes an α-factor leader [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646; EPO Publication No. 116,201] linked to human proinsulin. The proinsulin is encoded by a synthetic gene made with yeast-preferred codons (Figure 1). The α-factor leader sequence, the synthetic proinsulin gene and the α-factor terminator sequence are from pYBCA5, the construction of which is shown in Figure 1. Transcription is mediated by the 404 bp BamHI-NcoI GAPDH promoter fragment [Travis et al. (1985) J. Biol. Chem. 260:4384-4389]. The 1206 bp BamHI expression cassette, consisting of the GAPDH promoter, the sequence encoding the α-factor leader linked to proinsulin, and the α-factor terminator, was cloned into the unique BamHI site of the yeast shuttle vector pAB24 (below) or pCI/1 such that the GAPDH promoter sequence was proximal to the SalI site of the vector, giving the plasmids pYGAI1-AB24 and pYGAI1-CI/1, respectively. The 1206 bp BamHI expression cassette was also subcloned into the unique BamHI site of a derivative of pBR322 [pBR322(ΔEcoRI-SalI)BamHI; Travis et al., supra]. This plasmid was called pGAI1.

Plasmid pAB24 (Figure 4) is a yeast shuttle vector which contains the complete 2µ sequence [Broach, in: Molecular Biology of the Yeast Saccharomyces, Vol. 1, p. 445 (1981)] and pBR322 sequences. It also contains the yeast URA3 gene derived from plasmid YEp24 [Botstein et al. (1979) Gene 8:17] and the yeast LEU2d gene derived from plasmid pCI/1 (EPO Publication No. 116,201). Plasmid pAB24 was constructed by digesting YEp24 with EcoRI and religating the vector to remove the partial 2µ sequences. The resulting plasmid, YEp24ΔRI, was linearized by digestion with ClaI and ligated with the complete 2µ plasmid, which had been linearized with ClaI. The resulting plasmid, pCBou, was then digested with XbaI, and the 8605 bp vector fragment was gel isolated. This isolated XbaI fragment was ligated with a 4460 bp XbaI fragment containing the LEU2d gene isolated from pCI/1; the LEU2d gene is oriented in the same direction as the URA3 gene.



B. pYGAI3 

Plasmid pYGAI3 differs from pYGAI1 in that it encodes a modified α-factor leader wherein the codons for Asn at residues 23, 57 and 67 have been changed to encode Gln, thereby eliminating all three signals for N-linked glycosylation.






EP 0 324 274 A1 



The α-factor leader and the N-terminal 13 amino acids of proinsulin encoded by this plasmid were constructed by ligation of synthetic oligonucleotides to give a 294 bp fragment with a 5' NcoI overhang and a 3' HindIII overhang, the sequence of which is shown in Figure 2. The sequence of the appropriate oligonucleotides was altered during synthesis so that codons which specified Asn at positions 23, 57 and 67 of the natural α-factor leader now specified Gln at the same positions. The DNA sequence specifying the N-terminal 13 amino acids of proinsulin was identical to that in pYGAI1. The 294 bp synthetic DNA (NcoI-HindIII) fragment was substituted for the comparable fragment of pGAI1 and pYGAI1-CI/1, giving the plasmids pGAI3 and pYGAI3, respectively.


C. pYGAI8 

Plasmid pYGAI8 contains DNA encoding an α-factor leader in which two of the three glycosylation sites have been eliminated: Asn57 and Asn67 have been changed to Gln57 and Gln67. The resulting plasmid encodes only a single glycosylation site, at position Asn23. pYGAI8 was prepared as follows.

First, a 5' fragment was isolated from the expression cassette of pGAI1 by cutting with HpaII, followed by cutting with BamHI, and then gel isolating a 504 bp fragment containing the GAPDH promoter and the sequence encoding residues 1-33 of the α-factor leader. Next, plasmid pYGAI3, encoding an α-factor leader lacking glycosylation sites, was also sequentially cut with HpaII and BamHI, and a 702 bp fragment was isolated containing sequences encoding modified α-factor leader residues 34-83, the Lys-Arg processing site, the proinsulin sequence and the α-factor termination sequence. This fragment was then ligated to the 504 bp fragment from pGAI1, and the ligation product was cut with BamHI and a 1206 bp fragment isolated.
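Because both halves are joined at the same HpaII site at the start of codon 34, the quoted fragment sizes should add up to the size of the original cassette. A minimal Python sketch of that bookkeeping, using only sizes stated in the text (the dictionary layout is purely illustrative):

```python
# Fragment sizes (bp) quoted in the text for assembling the pYGAI8 cassette.
fragments = {
    # From pGAI1: GAPDH promoter plus codons for leader residues 1-33 (Asn23 kept).
    "pGAI1 BamHI-HpaII": 504,
    # From pYGAI3: leader residues 34-83 (Gln57/Gln67), Lys-Arg site,
    # proinsulin and alpha-factor terminator.
    "pYGAI3 HpaII-BamHI": 702,
}

PYGAI1_CASSETTE_BP = 1206  # size of the original pYGAI1 BamHI cassette
cassette_bp = sum(fragments.values())

# Exchanging halves at a shared HpaII site should not change the cassette size.
assert cassette_bp == PYGAI1_CASSETTE_BP
print(f"pYGAI8 cassette: {cassette_bp} bp")
```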

The above 1.2 kb BamHI fragment, which contained a complete GAPDH promoter/α-factor leader/proinsulin/α-factor terminator expression cassette, was then ligated into BamHI-cut and phosphatase-treated pBR322(ΔEcoRI-SalI)BamHI to give plasmid pGAI8, which was cloned in E. coli.

The 1.2 kb expression cassette was removed from pGAI8 by cutting with BamHI and gel isolating the fragment. It was ligated into BamHI-cut and phosphatase-treated yeast shuttle vector pAB24. The expression cassette was inserted in the unique BamHI site of the pBR322 sequences such that the GAPDH promoter was proximal to the unique SalI site of the vector. This plasmid was pYGAI8.


D. pYGAI7 

Plasmid pYGAI7 contains the DNA encoding a truncated α-factor leader and the synthetic gene for human proinsulin. The α-factor leader has been truncated so that it encodes only amino acids 1-35 of the α-factor leader, and it therefore contains a single glycosylation site, at Asn23. This yeast expression vector was constructed as follows.

First, pGAI1 was cut with HindIII. An HpaII-HindIII linker of the following structure was added:

5'-CGGCTAAAAGATTCGTTAACCAACACTTGTGTGGTTCTCACTTGGTTGA-3'
3'-  CGATTTTCTAAGCAATTGGTTGTGAACACACCAAGAGTGAACCAACTTCGA-5'

After the linker was added, the linearized plasmid was cut with BamHI, and a 558 bp HpaII-BamHI fragment was gel isolated. This fragment contains the codons for residues 34-35 of the α-factor leader linked directly to a Lys-Arg processing site and the proinsulin sequence. There are no intervening sequences between the codon for residue 35 of the α-factor leader and the processing site directly adjacent to the proinsulin sequence.
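The HpaII-HindIII linker can be checked mechanically. Its two strands should anneal, leaving a 5'-CG overhang (HpaII-compatible) at one end and a 5'-AGCT overhang (HindIII-compatible) at the other. Once joined to the HpaII end of the leader, the linker should also read in frame from Pro34-Ala35 through Lys-Arg into the start of the proinsulin B chain. The minimal Python sketch below uses the standard genetic code. The upstream bases ATTC are an assumption: they stand for the Ile33 codon and the first base of the Pro34 codon, taken from the ...ATTCCGGCT... context of the native leader.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in frame 1, ignoring any incomplete trailing codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Linker strands as printed above.
TOP = "CGGCTAAAAGATTCGTTAACCAACACTTGTGTGGTTCTCACTTGGTTGA"  # 5'->3'
# The lower strand is printed 3'->5'; reverse it to read 5'->3'.
BOTTOM = "CGATTTTCTAAGCAATTGGTTGTGAACACACCAAGAGTGAACCAACTTCGA"[::-1]

# Apart from the single-stranded overhangs, the strands are exact complements.
assert reverse_complement(TOP[2:]) == BOTTOM[4:]
print("HpaII-end overhang:", TOP[:2])
print("HindIII-end overhang:", BOTTOM[:4])

# Assumed 3' end of the 504 bp HpaII fragment: Ile33 codon + first C of Pro34.
UPSTREAM_END = "ATTC"
print(translate(UPSTREAM_END + TOP))
```

The overhangs come out as CG and AGCT, and the junction translates to IPAKRFVNQHLCGSHLV. That is Ile33-Pro34-Ala35, the Lys-Arg site, and the first twelve residues of the B chain, with no intervening codons.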

Second, pGAI1 was cut with HpaII and BamHI, and a 504 bp fragment was gel isolated. This fragment contains the GAPDH promoter and nucleotides encoding amino acids 1-33 of the α-factor leader, its 3' end terminating in an HpaII overhang complementary to the 5' end of the above-described 558 bp HpaII-BamHI fragment. These two fragments were ligated together and cut with BamHI to provide an expression cassette containing the GAPDH promoter, sequences encoding a modified α-factor leader containing residues 1-35 directly linked to a Lys-Arg processing site, the proinsulin gene, and the α-factor terminator. The cassette was then ligated into the BamHI site of pBR322(ΔEcoRI-SalI)BamHI, as described above, to give plasmid pGAI7, which was cloned in E. coli.

pGAI7 was then cut with BamHI, and the 1062 bp expression cassette gel isolated. The expression 
cassette was then ligated into the BamHI site of pAB24 to give plasmid pYGAI7. 
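The cassette sizes offer a quick consistency check on the truncation. The pYGAI7 cassette should be shorter than the full-length cassette by exactly the codons for leader residues 36-83, and the deletion should be in frame. A minimal Python sketch of that arithmetic, using only sizes quoted in this example:

```python
FULL_CASSETTE_BP = 504 + 702       # pGAI1 HpaII fragment + full-length leader half
TRUNCATED_CASSETTE_BP = 504 + 558  # pGAI1 HpaII fragment + linker-derived half
FULL_LEADER, TRUNCATED_LEADER = 83, 35  # leader residues preceding the Lys-Arg site

deleted_bp = FULL_CASSETTE_BP - TRUNCATED_CASSETTE_BP
deleted_codons, remainder = divmod(deleted_bp, 3)

assert TRUNCATED_CASSETTE_BP == 1062       # matches the pYGAI7 cassette size
assert remainder == 0                      # the deletion keeps the reading frame
assert deleted_codons == FULL_LEADER - TRUNCATED_LEADER  # residues 36-83
print(f"{deleted_bp} bp removed = {deleted_codons} codons")
```

The difference of 144 bp corresponds to 48 codons, which is exactly leader residues 36 through 83.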
E. Comparative Expression

Plasmids pYGAI1-CI/1, pYGAI3, pYGAI1-AB24, pYGAI7 and pYGAI8 were transformed into Saccharomyces cerevisiae strain AB103.1 (MATα, leu2-3,112, ura3-52, his4-580, pep4-3 [cir°]) essentially as described by Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929-1933. Transformants of pYGAI1-CI/1 and pYGAI3 were selected for leucine prototrophy; transformants of the other plasmids were selected for uracil prototrophy.

Data shown in Table 1 compare secretion of proinsulin mediated by the natural α-factor leader (pYGAI1-CI/1) or by the α-factor leader with Gln substituted for Asn at positions 23, 57 and 67 (pYGAI3).

Inoculum cultures (0.2 ml of individual transformants) were grown for 48 hr in synthetic complete medium lacking leucine [SD-leu; Sherman et al., Methods in Yeast Genetics, p. 62 (Cold Spring Harbor Laboratory, 1982)] and diluted 20-fold into the same medium. Cultures were grown for 48-72 hr; culture supernatants were prepared by centrifugation and assayed for immunoreactive, cross-reacting insulin-like material (ILM) in a competition radioimmunoassay with 125I-labeled insulin. As can be seen in Table 1, elimination of the three glycosylation sites from the α-factor leader resulted in essentially no secretion of insulin-like material (.003-.007, compared with .08-.24 for the native α-factor leader).

Data presented in Table 2 compare transformants of pYGAI1-AB24 (full-length native α-factor leader), pYGAI8 (full-length α-factor leader with only one glycosylation site, at Asn23) and pYGAI7 (truncated α-factor leader containing a single glycosylation site at Asn23) for their ability to secrete insulin-like material.

Inoculum cultures of the indicated transformants (0.2 ml) in SD-leu, grown for 48 hr at 30 °C, were pelleted by centrifugation, washed and diluted 20- to 50-fold into ura− medium. This medium contains 0.67% yeast nitrogen base, 1% succinic acid, 0.35% NaOH, 0.5% casamino acids, 2% glucose, 0.005% adenine, 0.01% tryptophan and 0.02% threonine. Cultures were grown at 30 °C for 48-72 hr, and culture supernatants were prepared and assayed as described above. The data in Table 2 show that transformants carrying the construct with the truncated α-factor leader retaining a single glycosylation site at Asn23 generally secreted more immunoreactive insulin-like material (mean .62) than transformants bearing the construct with the full-length native α-factor leader (mean .37). Transformants bearing the construct with the full-length α-factor leader and the same single glycosylation site (Asn23) secreted much less insulin cross-reactive material (mean .09) than transformants bearing either the full-length native or the truncated α-factor leader.


Table 1

Effect of Elimination of α-Factor Leader Glycosylation Sites on Secretion of Insulin-Like Material

Transformant              OD650    ILM(1), µg/ml    ILM(1), µg/ml per OD650
AB103.1[pYGAI1-CI/1]-1    5.9      .24              .04
AB103.1[pYGAI1-CI/1]-2    5.9      .24              .04
AB103.1[pYGAI1-CI/1]-3    2.1      .08              .04
AB103.1[pYGAI3]-1         ND       .003
AB103.1[pYGAI3]-2         ND       .007

1) Cross-reactive insulin-like material (ILM) as determined by a competition radioimmunoassay with 125I-labeled insulin and insulin standards. Data are reported as ILM secreted per ml of culture and, in some cases, as ILM secreted per ml normalized to a culture cell density with an absorbance of 1 at a wavelength of 650 nm.
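The normalized column in Table 1 is the secreted ILM concentration divided by the culture's OD650. A minimal Python sketch of that normalization, applied to the three pYGAI1-CI/1 transformants (the function name is hypothetical):

```python
def ilm_per_od650(ilm_per_ml: float, od650: float) -> float:
    """Secreted ILM per ml of culture, normalized to a culture density of OD650 = 1."""
    if od650 <= 0:
        raise ValueError("OD650 must be positive")
    return ilm_per_ml / od650


# (ILM per ml, OD650) for the three pYGAI1-CI/1 transformants in Table 1,
# in the table's own concentration units.
TABLE_1 = {"-1": (0.24, 5.9), "-2": (0.24, 5.9), "-3": (0.08, 2.1)}

for transformant, (ilm, od) in TABLE_1.items():
    print(f"pYGAI1-CI/1{transformant}: {ilm_per_od650(ilm, od):.2f}")
```

Each transformant comes out at about .04, matching the last column of the table. The pYGAI3 rows have no OD650 value and so cannot be normalized this way.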




Table 2

Effect of Truncated or Full-Length α-Factor Leader with a Single Glycosylation Site at Asn23 on Secretion of Insulin-Like Material

                        No. of     ILM(2), µg/ml                 ILM(2), µg/ml per OD650
Transformant            Tests(1)   Range      Mean   Std. dev.   Range       Mean   Std. dev.
AB103.1[pYGAI1-AB24]    7          .22-.55    .37    .12         .02-.04     .026   .01
AB103.1[pYGAI7]         8          .38-1.0    .62    .24         .03-.06     .043   .01
AB103.1[pYGAI8]         8          .03-.18    .09    .05         .003-.01    .007   .003

1) A minimum of three independent transformants were tested.
2) Secreted cross-reactive insulin-like material (ILM) was determined by a competition radioimmunoassay with 125I-labeled insulin and insulin standards. Data are reported as ILM secreted per ml of culture and as ILM secreted per ml normalized to a culture cell density with an absorbance of 1 at a wavelength of 650 nm.
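The comparison drawn from Table 2 can be expressed as ratios of each construct's mean values to those of the native-leader construct. A minimal Python sketch follows. A ratio of means says nothing about statistical significance, and the spread within each group (see the standard deviations) is substantial.

```python
# Mean ILM values from Table 2: (per ml of culture, per ml per OD650).
TABLE_2_MEANS = {
    "pYGAI1-AB24 (full-length, native)": (0.37, 0.026),
    "pYGAI7 (truncated, Asn23 only)": (0.62, 0.043),
    "pYGAI8 (full-length, Asn23 only)": (0.09, 0.007),
}
REFERENCE = "pYGAI1-AB24 (full-length, native)"


def fold_change(value: float, reference: float) -> float:
    """Ratio of a mean value to the corresponding reference mean."""
    return value / reference


ref_ml, ref_od = TABLE_2_MEANS[REFERENCE]
for name, (per_ml, per_od) in TABLE_2_MEANS.items():
    print(f"{name}: {fold_change(per_ml, ref_ml):.2f}x per ml, "
          f"{fold_change(per_od, ref_od):.2f}x per OD650")
```

On these means, the truncated leader gives roughly 1.7-fold the native level, and the full-length single-site leader gives roughly a quarter of it, on either basis.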



II. 



This example compares the expression of a full-length α-factor leader construct, retaining all glycosylation sites, with that of an expression construct employing a truncated α-factor leader retaining only a single glycosylation site, at Asn23. The non-yeast protein employed in this example is a human proinsulin analog wherein the connecting "C" peptide has been replaced by a yeast KEX2 endopeptidase cleavage site.



A. pYGAIC3

The plasmid pGAIC3 was made by replacing the 231 bp HindIII-SalI fragment of pGAI1, which encodes amino acids 14 through 30 of the B chain, the C-peptide, the A chain and 2 translation stop codons, with a 132 bp synthetic HindIII-SalI gene fragment (shown in Figure 3) which encodes amino acids 14 through 30 of the B chain, a Lys-Arg KEX2 endopeptidase cleavage site, the A chain, and translation stop codons. The plasmid pYGAIC3 was prepared from pGAIC3 as follows.

Plasmid pGAIC3 was digested with BamHI, and the 1107 bp BamHI expression cassette, containing the GAPDH promoter, the sequence encoding the α-factor leader linked to the proinsulin analog, and the α-factor transcription terminator, was isolated, ligated into BamHI-digested and phosphatase-treated pAB24, and then cloned in E. coli. Plasmid pYGAIC3 was obtained, in which the expression cassette was oriented such that the GAPDH promoter was proximal to the unique SalI site of the vector.
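The 1107 bp size of the pGAIC3 cassette follows from the fragment swap. The minimal Python sketch below checks that arithmetic. It assumes the usual layout of human proinsulin (B chain, Arg-Arg, 31-residue C-peptide, Lys-Arg, A chain), in which the segment joining the B and A chains is 35 codons long; in the analog, that segment shrinks to the 2 codons of the Lys-Arg site. It also assumes the non-coding flanks of the two HindIII-SalI fragments are the same length.

```python
PGAI1_CASSETTE_BP = 1206
NATIVE_HINDIII_SALI_BP = 231     # B14-B30, C-peptide region, A chain, stop codons
SYNTHETIC_HINDIII_SALI_BP = 132  # B14-B30, Lys-Arg, A chain, stop codons

pgaic3_cassette = PGAI1_CASSETTE_BP - NATIVE_HINDIII_SALI_BP + SYNTHETIC_HINDIII_SALI_BP
assert pgaic3_cassette == 1107  # size given in the text

# Assumption: native connector = Arg-Arg + 31-residue C-peptide + Lys-Arg.
native_connector_codons = 2 + 31 + 2
analog_connector_codons = 2  # the Lys-Arg KEX2 site
removed_bp = NATIVE_HINDIII_SALI_BP - SYNTHETIC_HINDIII_SALI_BP
assert removed_bp == 3 * (native_connector_codons - analog_connector_codons)
print(f"pGAIC3 cassette: {pgaic3_cassette} bp; {removed_bp} bp removed")
```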



B. pYafL7C3

Plasmid pYafL7C3 contains DNA encoding the truncated α-factor leader described above for pYGAI7 linked to the sequence for the proinsulin analog, also described above (pYGAIC3). This plasmid was constructed as follows.

First, pGAIC3 was cut with HindIII and SalI, and a 132 bp fragment was gel isolated. This fragment contains sequences encoding all but the first 12 codons of the proinsulin analog. It was ligated into a gel-isolated 4640 bp fragment from HindIII- and SalI-digested pGAI7 to provide plasmid pafL7C3. After cloning in E. coli, this plasmid was cut with BamHI and a 1062 bp BamHI fragment was gel isolated. This expression cassette contains the truncated α-factor leader construct of pGAI7 with the proinsulin analog in place of the normal proinsulin sequence. The expression cassette was then ligated into the BamHI site of pAB24, as described above, to give pYafL7C3.
C. Comparative Expression

Expression levels were determined for pYGAIC3 and pYafL7C3 in two strains of S. cerevisiae. Strain AB103.1 has been described in Example I. Strain AB110-4 is a derivative of Saccharomyces cerevisiae strain AB110 (MATα, leu2, ura3-52, pep4-3, his4-580 [cir°]) in which a deletion has been engineered into the pep4 gene. These strains were transformed as described above with plasmids pYGAIC3 and pYafL7C3, and uracil prototrophs were selected. Inoculum cultures were grown in SD-leu [Sherman et al., supra] at 30 °C for 24-48 hours, then pelleted by centrifugation, washed, diluted 20-fold into ura− medium (described above) and grown for 48-72 hours at 30 °C. Cell-free conditioned culture medium was prepared by centrifugation for assay in a competition insulin radioimmunoassay.

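The results are shown in Table 3. As can be seen, the truncated α-factor construct mediates increased secretion of immunoreactive proinsulin analog compared with the natural α-factor leader sequence: mean ILM was 4.46 versus 1.66 in strain AB103.1 and 3.46 versus 1.28 in strain AB110-4, roughly 2.7-fold in each strain.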

Table 3

Secretion of ILM from a Proinsulin Analog Construct Mediated by a Truncated α-Factor Leader or Natural α-Factor Leader

                        No. of     ILM(2), µg/ml                 ILM(2), µg/ml per OD650
Transformant            Tests(1)   Range       Mean   Std. dev.  Range       Mean   Std. dev.
AB103.1[pYGAIC3]        6          1-2.75      1.66   .65        .11-.20     .14    .04
AB103.1[pYafL7C3]       6          1.5-6.63    4.46   2.15       .14-.60     .40    .19
AB110-4[pYGAIC3]        3          1.15-1.4    1.28   .13        .10-.12     .11    .01
AB110-4[pYafL7C3]       3          2.15-4.12   3.46   1.14       .19-.38     .31    .10

1) A minimum of three independent transformants were tested.
2) Secreted cross-reactive insulin-like material (ILM) was determined by a competition radioimmunoassay with 125I-labeled insulin and insulin standards. Results are reported as amounts of ILM per ml of culture and as amounts secreted per ml normalized to a cell density with an absorbance of 1 at a wavelength of 650 nm.



III.



This example describes the construction of a truncated α-factor expression vector which mediates increased expression levels of active insulin-like growth factor-1.

First, a DNA sequence encoding a truncated α-factor leader and a coding sequence for IGF1 was prepared. A synthetic sequence was prepared by standard procedures employing an Applied Biosystems 380A DNA synthesizer according to the manufacturer's directions. Fourteen DNA sequences ranging from 22 to 57 bases in length were synthesized, purified by PAGE, and phosphorylated individually by T4 kinase in the presence of ATP. The sequences were then annealed and ligated by standard procedures.

The sequence of the synthetic gene is shown in Figure 5. The purified synthetic gene fragment was cloned into NcoI/SalI-digested pBS100 (described below). The resulting plasmid was called pBS100 TafL IGF1.

Plasmid pBS100 contains a yeast expression cassette cloned into a pBR322 derivative, pAB12. The expression cassette contains the hybrid ADH-2/GAPDH promoter and the GAPDH terminator flanking a non-essential gene segment. The ADH-2/GAPDH promoter is a 1200 bp BamHI-NcoI fragment isolated from pJS103 (see below), and the GAPDH terminator is a 900 bp SalI-BamHI fragment isolated from plasmid pPAG1 (EPO Publication No. 164,556). Plasmid pBS100 also contains a non-essential fragment between the NcoI and SalI sites which is replaced by gene fragments of interest. The expression cassette can be removed from pBS100 by digestion with BamHI and cloned into yeast shuttle vectors for introduction into yeast cells.
Plasmid pJS103, which contains the hybrid ADH-2/GAPDH promoter employed above, was constructed as follows. The ADH-2 portion of the promoter was constructed by cutting plasmid pADR2, which contains the wild-type ADH2 gene [Beier et al. (1982) Nature 300:724-728], with restriction enzyme EcoRV, which cuts at position +66 relative to the ATG start codon as well as at two other sites in pADR2 outside of the ADH2 region. The resulting mixture of a vector fragment and two smaller fragments was treated with Bal31 exonuclease to remove about 300 bp. Synthetic XhoI linkers were ligated onto the Bal31-treated DNA. The resulting linker-vector fragment (about 5 kb) was separated from the free linkers by column chromatography, cut with XhoI, religated, and used to transform E. coli to ampicillin resistance. The positions of the XhoI linker were determined by DNA sequencing. One plasmid, which contained an XhoI linker within the 5' nontranscribed region of the ADH2 gene (position -232 from the ATG), was cut with XhoI, treated with nuclease S1, and subsequently treated with EcoRI to create a linear vector molecule having one blunt end at the site of the XhoI linker and an EcoRI end. The GAPDH portion of the promoter was constructed by cutting plasmid pPGAP [EPO Publication No. 164,556] with BamHI and EcoRI, followed by isolation of the 0.4 kbp DNA fragment. This purified fragment was then completely digested with AluI, and an approximately 200 bp fragment was isolated. This GAPDH promoter fragment was ligated to the ADH-2 fragment present on the linear vector described above to give plasmid pJS103.

A BamHI fragment was then isolated from pBS100 TafL IGF1. This fragment contains the ADH2/GAPDH promoter, a truncated α-factor leader (amino acids 1-25 and 81-83), a Lys-Arg processing site, a coding sequence for IGF1, and the GAPDH terminator sequence. This BamHI fragment was then cloned into pAB24 previously digested with BamHI. A positive clone was selected; initially called plasmid 18.5, it was subsequently named pYLUIGF1-55 (see Figure 6).

A second expression vector, pYLUIGF1-24, was also prepared by analogous methods. A restriction map is shown in Figure 7. This vector is similar to pYLUIGF1-55, except that it has a full-length α-factor leader with three glycosylation sites directing secretion (compare Example I.A) and the α-factor terminator.

Yeast strain AB110 (EPO Publication No. 164,556) was transformed with pYLUIGF1-55 and pYLUIGF1-24 by conventional spheroplasting techniques [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929-1933], and expression was compared.

The expression of IGF1 from AB110 (pYLUIGF1-55) and AB110 (pYLUIGF1-24) is non-constitutive, because the hybrid ADH2/GAPDH promoter is repressed by glucose. Induction of IGF1 expression was therefore achieved by allowing the glucose concentration in the growth medium to fall. Under standard conditions, shake flask cultures (25 ml) fully utilize the glucose in the medium by 18-24 hours post inoculation. Thus, 25 ml cultures of AB110 (pYLUIGF1-55) and AB110 (pYLUIGF1-24) were grown under standard conditions for 72 hours. Supernatant samples were taken at 49 and 72 hours post inoculation and assayed for IGF1 biological activity by radioreceptor assay (RRA) and for immunoreactivity by radioimmunoassay (RIA) with anti-IGF1 antibodies. As can be seen in Table 4, pYLUIGF1-55, with a truncated α-factor leader, secreted protein of which a substantially greater fraction was biologically active (RRA/RIA of roughly 7-9%, versus 2-4% for pYLUIGF1-24). Although pYLUIGF1-24 secreted more protein that showed reactivity with IGF1 antibodies, relatively little of this protein was biologically active.

The results are shown in Table 4. The radioreceptor assay (RRA) measures the ability of IGF-1 to bind to its receptor. This is a measure of the biological activity of the recombinant polypeptide, since IGF-1 is believed to exert all of its activity through its receptor. The receptor assay is described in Marshall et al. (1974) J. Clin. Endocrinol. Metab. 39:283-292. The radioimmunoassay (RIA) is a competitive assay that measures the amount of protein antigenically cross-reactive with native IGF-1, whether or not it is biologically active. The assay is described in Zapf et al. (1981) J. Clin. Invest. 68:1321-1330.

Table 4. Secretion of IGF1 Mediated by a Truncated α-Factor Leader or a Natural α-Factor Leader

                         49 hrs             72 hrs
Transformant             RRA(1)   RIA(2)    RRA(1)   RIA(2)
AB110 (pYLUIGF1-55)      1.0      14        1.3      14
AB110 (pYLUIGF1-24)      1.3      54        2.7      66

1) µg/ml
2) µg/ml
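One way to read Table 4 is as the ratio of receptor-binding (RRA) to immunoreactive (RIA) IGF1, a rough index of the fraction of secreted material that is active. This reading assumes both assays report on a comparable concentration scale against the same IGF-1 standard, which the text does not state. A minimal Python sketch:

```python
# Table 4 values: (RRA, RIA) keyed by (plasmid, hours post inoculation).
TABLE_4 = {
    ("pYLUIGF1-55", 49): (1.0, 14),
    ("pYLUIGF1-55", 72): (1.3, 14),
    ("pYLUIGF1-24", 49): (1.3, 54),
    ("pYLUIGF1-24", 72): (2.7, 66),
}


def active_fraction(rra: float, ria: float) -> float:
    """RRA/RIA ratio: receptor-active IGF1 per unit of immunoreactive IGF1."""
    return rra / ria


for (plasmid, hours), (rra, ria) in TABLE_4.items():
    print(f"{plasmid} at {hours} h: {active_fraction(rra, ria):.1%}")
```

This gives roughly 7-9% for the truncated-leader construct (pYLUIGF1-55) and 2-4% for the full-length construct (pYLUIGF1-24).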



Deposit of Biological Materials 



The following expression vectors were deposited with the American Type Culture Collection (ATCC), 
12301 Parklawn Drive, Rockville, Maryland, U.S.A., and will be maintained under the provisions of the 
Budapest Treaty. The accession numbers and dates of deposit are listed below. 



Deposited Material        ATCC Number   Deposit Date
E. coli (pYGAI7)          67597         12/29/87
E. coli (pYafL7C3)        67596         12/29/87
E. coli (pYLUIGF1-55)     67595         12/29/87



These deposits are provided for the convenience of those skilled in the art. These deposits are neither an 
admission that such deposits are required to practice the present invention nor that equivalent embodiments 
are not within the skill of the art in view of the present disclosure. The public availability of these deposits is 
not a grant of a license to make, use or sell the deposited materials under this or any other patent. The 
nucleic acid sequences of the deposited materials are incorporated in the present disclosure by reference, 
and are controlling if in conflict with any sequences described herein. 

Although the foregoing invention has been described in some detail for the purpose of illustration, it will 
be obvious that changes and modifications may be practiced within the scope of the appended claims by 
those of ordinary skill in the art. 



Claims 

1. A yeast cell comprising a DNA construct that provides for the expression and secretion of a non-yeast protein, said DNA construct comprising a coding sequence under the control of yeast-recognized transcription initiation and termination sequences, said coding sequence encoding a precursor polypeptide comprised of a leader sequence and said non-yeast protein linked by a processing site that provides for the cleavage of said non-yeast protein from said precursor polypeptide, wherein said leader sequence is about 25 to about 50 N-terminal residues of said precursor polypeptide and comprises the signal peptide of a yeast α-factor precursor and a single glycosylation site.

2. The cell according to claim 1 wherein said non-yeast protein is a mammalian protein. 

3. The cell according to claim 2 wherein said mammalian protein is human insulin-like growth factor I or 
a precursor of human insulin. 
4. The cell according to claim 3 wherein said precursor of human insulin is human proinsulin or wherein said precursor of human insulin comprises insulin A chain and insulin B chain linked by a yeast-recognised processing site cleaved in vivo.

5. The cell according to claim 4 wherein said processing site is cleaved by the KEX2 gene product of Saccharomyces.

6. The cell according to any one of claims 1 to 5 wherein said yeast cell is from the genus 
Saccharomyces. 

7. The cell according to claim 6 wherein said yeast cell is S. cerevisiae. 

8. The cell according to claim 7 wherein said yeast α-factor precursor is S. cerevisiae MFα1.

9. A double-stranded DNA molecule comprising a region encoding a precursor polypeptide secretable by a yeast host, said region, with reference to one of the strands, having the structure:

5'-AF-CHO-Xn-S-Gene*-3'

wherein

AF encodes a yeast α-factor signal;
CHO encodes a glycosylation site;

Xn encodes a polypeptide of n amino acids in length that does not contain a glycosylation site or a processing site that provides for cleavage of said precursor polypeptide in vivo by yeast;
n is an integer from 0 to about 30;

Gene* encodes a non-yeast protein; and

S encodes a processing site that provides for cleavage of said precursor polypeptide.

10. The DNA molecule according to claim 9 wherein AF encodes a polypeptide of about 19 to 23 amino 
acids in length. 

11. The DNA molecule according to claim 9 or claim 10 wherein n is an integer from about 0 to about 20.

12. The DNA molecule according to claim 11 wherein n is an integer from about 0 to about 10.

13. The DNA molecule according to claim 12 wherein n is an integer from about 3 to about 10. 

14. The DNA molecule according to any one of claims 9 to 13 wherein said yeast host is a Saccharomyces.

15. The DNA molecule according to any one of claims 9 to 14 wherein said yeast α-factor signal peptide is a Saccharomyces signal peptide.

16. The DNA molecule according to any one of claims 9 to 15 wherein S encodes a processing site 
recognised in vivo by said yeast host. 

17. The DNA molecule according to claim 16 wherein S encodes a dipeptide recognised by the KEX2 endopeptidase.

18. The DNA molecule according to claim 17 wherein said dipeptide is 5'-Lys-Arg-3' or 5'-Arg-Arg-3'.

19. The DNA molecule according to any of claims 9 to 18 comprising a replicon. 

20. The DNA molecule according to claim 19 wherein said region encoding said precursor polypeptide is under the control of yeast-recognised transcription initiation and termination sequences, and said replicon is a yeast replicon.

21. The DNA molecule according to claim 20 wherein said replicon is a plasmid or a chromosome. 
FIGURE 2

Sequence of synthetic gene fragment of pYGAI3
FIGURE 3

Sequence of the Synthetic Gene Fragment of pYGAIC3
FIGURE 4

FIGURE 6

FIGURE 7